We present an analytically tractable model of Internet evolution at the level of Autonomous
Systems (ASs). We call our model the multiclass preferential attachment (MPA) model. As its
name suggests, it is based on preferential attachment. All of its parameters are measurable from
available Internet topology data. Given the estimated values of these parameters, our analytic results
predict a definitive set of statistics characterizing the AS topology structure. These statistics are not
part of the model formulation. The MPA model thus closes the "measure-model-validate-predict"
loop, and provides further evidence that preferential attachment is a driving force behind Internet
evolution.

PACS numbers: 89.20.Hh; 89.75.Fb; 89.75.Hc; 05.65.+b



I. INTRODUCTION 

The Internet is a paradigmatic example of a complex 
network. Many researchers have used publicly available 
data on Internet topology and its observed evolution to 
test a variety of physical models of complex network 
structure and dynamics. The large-scale Internet topology
represents the structure of connections between companies
owning parts of the Internet infrastructure, each company
roughly corresponding to an Autonomous System (AS) [1].
An AS might be a transit Internet service
provider (ISP) or customer, a content provider or sink, or
any combination of these. Some ASs span multiple continents
and are highly interconnected, while others are
present at a single geographical location and have only
a few links. In 1999 Faloutsos et al. [2] observed that
despite all this diversity, the distribution of AS degrees
obeys a simple power law. This observation remains valid
today, after ten years of Internet evolution [3].

Many researchers have attempted to model the Internet
as an evolving system [4]-[17], and have studied
its properties [18]-[21]. However, questions regarding
the main drivers behind Internet topology evolution re- 
main [22]. In this paper our main objective is to create 
an evolutionary model of the AS-level Internet topology 
that simultaneously: 

1. is realistic, 

2. is parsimonious, 

3. has all of its parameters measurable, 

4. is analytically tractable, and 

5. "closes the loop." 

Parsimony implies that the model should be as simple
as possible, and, related to that, the number of its parameters
should be as small as possible. The fifth requirement
means that if we substitute measured values of
these parameters into analytic expressions of the model,
then these expressions will yield results matching empirical
observations of the Internet. However, most critical is
the third requirement [22]: as soon as a model has even a
few unmeasurable parameters, one can freely tune them 
to match observations. Such parameter tweaking to fit 
the data may create an illusion that the model "closes the 
loop," but in the end it inevitably diminishes the value 
of the model because there is no rigorous way to tell why 
one model of this sort is better than another since they 
all match observations. To the best of our knowledge, the 
multiclass preferential attachment (MPA) model that we 
propose and analyze in this paper is the first model sat- 
isfying all five requirements listed above. 

A salient characteristic of our model is that we dis- 
tinguish between two kinds of ASs: ISPs and non-ISPs. 
The main difference between these two types of ASs is 
that while both ISPs and non-ISPs can connect to ISPs, 
no new AS will connect to an existing non-ISP since the 
latter does not provide transit Internet connectivity. In 
Section II we analyze the effect of this distinction on the
degree distribution. In Section III we account for other
processes. ISPs can form peering links to exchange traffic 
bilaterally. They can also go bankrupt and be acquired 
by others. Finally, they can multihome, i.e., connect to 
multiple providers. Prior work has often focused on these 
processes as the driving forces behind Internet topology 
evolution. We show that in reality they have relatively 
little effect on the degree distribution. Using the best 
available Internet topology data, we measure the parameters
reflecting all the processes above, and analytically
study how they affect the degree distribution. 

However, the degree distribution alone does not fully
capture the properties of the Internet AS graph [23].
The dK-series formalism introduced in [23] defines
a systematic basis of higher-order degree distributions/correlations.
The first-order (1K) distribution reduces to the traditional
degree distribution. The second-order (2K) distribution is the
joint degree distribution, i.e., the correlation of degrees of connected nodes.
The distributions can be further extended to account for
higher-order correlations [23], or for different types of
nodes and links, called annotations [24]. In the economic
AS Internet, there are two types of links: links connecting
customer ASs to their providers (customer-to-provider
links), and links connecting ISPs to their peers (peer-to-peer
links). Reproducing the 2K-annotated distribution
of AS topologies suffices to accurately capture virtually
all important topology metrics [23], [24]. An important
feature of the MPA model is that, by construction, it
naturally annotates the links between ASs by their business
relationships, and that we can analytically calculate,
in Section IV, the distributions for the number of peers,
customers, and providers that nodes have. In Section V
we perform a 2K-annotated test. We generate synthetic
graphs using the MPA model, and find that these graphs
exhibit a startling similarity to the observed AS topology
according to almost all the 2K-annotated statistics. This
validation, in conjunction with observations in [23], [24],
ensures that other important topology metrics also match
well.



II. TWO-CLASS PREFERENTIAL 
ATTACHMENT MODEL 

In the original preferential attachment (PA)
model [25], there is only one type of node. Suppose
that nodes arrive in the system at the rate of
one node per unit time, and let them be numbered
$s = 1, 2, 3, \ldots$ as they arrive. Then the number of nodes
in the system at time $t$ is equal to $t$. A node entering
the system brings a link with it. One end of the link
is already connected to the entering node, while the
other end is loose. We call such an un-associated end
a loose connection. According to the PA model, nodes
attach to existing ones with a probability proportional
to their degrees. Thus, the probability that a node of
degree $k$ is selected as a target for the incoming loose
connection is its degree divided by the total number
of existing connections in the system, i.e., $k/(2t)$. The
original PA model yields a power-law degree distribution
$P(k) \sim k^{-\gamma}$ with $\gamma = 3$, but the linear preference
function can be modified by an additive term such that
the model produces power laws with any $\gamma > 2$ [25]-[27].

In the Internet, there are two fundamentally different 
types of ASs — ISPs and non-ISPs — that differ in whether 
they provide traffic carriage between the ASs they con- 
nect or not. No new AS would connect to an existing 
non-ISP since it cannot provide Internet connectivity. 
No other work has attempted to model this observation, 
which is fundamental to understanding the evolution of 
Internet AS-level topology. Thus, our first modification
of the PA model is to consider these two classes of nodes
(Figure 1). New ISPs appear at rate 1 and connect to
other ISP nodes with a linear preference. New non-ISPs
appear at some rate $\rho$ per unit time, implying that the
ratio $r$ of non-ISP nodes to the total number of nodes is

$$ r = \frac{\rho}{1+\rho}. $$

FIG. 1: Two-class preferential attachment model. There are
ISP nodes and non-ISP nodes.



Non-ISPs attach to existing ISP-nodes with a linear pref- 
erence. However, no further attachments to non-ISPs can 
occur. Thus, with respect to degree distribution, only the 
ISP-nodes contribute to the tail of the power law as the 
non-ISP nodes will all have degree 1. Given that a link 
between an ISP node and a non-ISP node has only one 
end that contributes to the degree of ISP nodes, and since 
we look only at ISP nodes to find the degree distribution, 
the links that have an ISP node on one end and a non- 
ISP node on the other should be counted only once, i.e., 
they are counted as contributing 1 connection. 

Since each ISP node contributes 2 connections to the
network and a non-ISP node contributes only 1, the total
number of existing connections in the network at time $t$
is $(2+\rho)t$, which implies that the probability of any loose
connection connecting to an ISP node of degree $k$ is

$$ \frac{k}{(2+\rho)t}. \tag{1} $$

We use the notation $p(k,s,t)$ to denote the probability
that an ISP node $s$ has degree $k$ at time $t$. Then the
average degree of an ISP node $s$ at time $t$ is

$$ \bar{k}(s,t) = \sum_{k} k\,p(k,s,t). \tag{2} $$

Both entering ISPs and non-ISPs have one loose connection
each, so the number of loose connections entering the
system at time $t$ is $1+\rho$. Then, from (1), the continuous-time
model of the system is

$$ \frac{d\bar{k}(s,t)}{dt} = \frac{1+\rho}{(2+\rho)t}\,\bar{k}(s,t), \tag{3} $$

with boundary condition $\bar{k}(t,t) = 1$ for $t > 1$. Solving
this equation yields

$$ \bar{k}(s,t) = \left(\frac{t}{s}\right)^{\frac{1+\rho}{2+\rho}}. \tag{4} $$
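The closed form (4) can be checked numerically. The following is a minimal sketch, assuming the growth equation $d\bar{k}/dt = \frac{1+\rho}{(2+\rho)t}\bar{k}$ with $\bar{k}(s,s) = 1$; the function names and the choice of a classical Runge-Kutta integrator are illustrative assumptions and are not part of the original analysis.

```python
def mean_degree_closed_form(s, t, rho):
    """Closed-form mean degree (t/s)^((1+rho)/(2+rho)) of an ISP that arrived at time s."""
    return (t / s) ** ((1.0 + rho) / (2.0 + rho))


def mean_degree_numeric(s, t, rho, steps=10000):
    """Integrate dk/dt = (1+rho)/((2+rho)*t) * k from k(s) = 1 with classical RK4."""
    a = (1.0 + rho) / (2.0 + rho)

    def rate(tau, k):
        return a * k / tau

    h = (t - s) / steps
    tau, k = float(s), 1.0
    for _ in range(steps):
        k1 = rate(tau, k)
        k2 = rate(tau + h / 2, k + h * k1 / 2)
        k3 = rate(tau + h / 2, k + h * k2 / 2)
        k4 = rate(tau + h, k + h * k3)
        k += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return k


if __name__ == "__main__":
    rho = 7.0 / 3.0  # illustrative non-ISP arrival rate
    print(mean_degree_closed_form(10, 1000, rho))
    print(mean_degree_numeric(10, 1000, rho))
```

The two printed values should agree closely, which is only a consistency check of the algebra and says nothing about how well the continuous-time approximation describes a finite random graph.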



This model represents a deterministic system in which, if
ISP node $s$ has degree $k$, then ISP nodes that arrived before
$s$ (in the interval $[0, s)$) have degree at least $k$. Thus,
$s$ also represents the number of ISP nodes that have degree
at least $k$. It follows from (4) that the number of
ISP nodes that have degree $k$ or higher is

$$ s = t\,k^{-\frac{2+\rho}{1+\rho}}. $$

Note that the number of ISP nodes that arrive in $[0, t]$ is
just $t$. Hence, the fraction of ISP nodes that have degree
$k$ or higher is

$$ k^{-\frac{2+\rho}{1+\rho}}. $$

Since this fraction is essentially a complementary cumulative
distribution function (CCDF), we differentiate it
and multiply by $-1$ to obtain the density function

$$ f(k) = \frac{2+\rho}{1+\rho}\,k^{-\left(2+\frac{1}{1+\rho}\right)}, \tag{5} $$

which corresponds to the probability distribution

$$ P(k) \sim k^{-\left(2+\frac{1}{1+\rho}\right)} = k^{-(3-r)}. \tag{6} $$
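As a small worked example of the relations above, the sketch below evaluates the non-ISP fraction, the two-class exponent, and the CCDF and density of ISP degrees for a given $\rho$. The function names are hypothetical, and the formulas are taken as $r = \rho/(1+\rho)$ and $\gamma = 2 + 1/(1+\rho) = 3 - r$.

```python
def non_isp_fraction(rho):
    """Fraction r of non-ISP nodes when non-ISPs arrive at rate rho and ISPs at rate 1."""
    return rho / (1.0 + rho)


def two_class_exponent(rho):
    """Degree exponent gamma = 2 + 1/(1+rho), which equals 3 - r."""
    return 2.0 + 1.0 / (1.0 + rho)


def isp_degree_ccdf(k, rho):
    """Fraction of ISPs with degree k or higher: k^(-(2+rho)/(1+rho))."""
    return k ** (-(2.0 + rho) / (1.0 + rho))


def isp_degree_density(k, rho):
    """Minus the derivative of the CCDF with respect to k."""
    a = (2.0 + rho) / (1.0 + rho)
    return a * k ** (-(a + 1.0))


if __name__ == "__main__":
    for rho in (0.0, 1.0, 7.0 / 3.0):
        print(rho, non_isp_fraction(rho), two_class_exponent(rho))
```

Setting $\rho = 0$ recovers the exponent 3 of standard preferential attachment, while larger $\rho$ pushes the exponent toward 2.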



Validation Against Observed Topology 1 

Dimitropoulos et al. [28] applied machine learning
tools to the best available data from the Internet registry
(WHOIS) and routing (Border Gateway Protocol (BGP))
systems to classify ASs into several different classes,
such as Tier-1 ISPs, Tier-2 ISPs, Internet exchange
points, universities, customer ASs, and so on. They
validated the resulting taxonomy by direct examination
of a large number of ASs. We use their results and
divide ASs into two classes based on whether they are
ISPs or not. According to [28] (the dataset is available at
http://www.caida.org/data/active/as_taxonomy/),
the number of ISPs is about 30% of ASs, while non-ISPs
make up the other 70%, i.e., $r = 0.7$ and $\rho = 7/3$.
The measured value of $\rho$ yields $P(k) \sim k^{-2.3}$, with the
exponent close to the observed values between $-2.1$ and
$-2.2$ [2], [3].

The key point of this section is that the observed value 
of the power-law exponent close to 2 finds a natural and 
simple explanation: it is due to preferential attachment 
and to a directly measured high proportion of non-ISP 
nodes to which newly appearing nodes cannot connect. 



III. MULTICLASS PREFERENTIAL 
ATTACHMENT: PEERING, BANKRUPTCY, 
MULTIHOMING, AND GEOGRAPHY 

In this section we add further refinements to our model 
and show that, contrary to common beliefs, none of these 
refinements have a significant impact on the degree dis- 
tribution shape. 

Relationships between ASs change over time, as ASs
pursue cost-saving measures. If the magnitude of traffic
flow between two ISPs is similar in both directions,
then reciprocal peering with each other allows each ISP
to reduce its transit costs. Under the assumption that
all customer ASs generate similar volumes of traffic, high
degree ASs would exchange high traffic volume and rationally
seek to establish reciprocal peering with other
high degree ASs. We denote the rate at which peering
links appear by $c$. The probability that a new peering
link becomes attached to a pair of ISP nodes of degrees
$k_1$ and $k_2$ is proportional to $k_1 k_2$.

FIG. 2: The multiclass preferential attachment model.

When ISPs go bankrupt, their infrastructure is usually 
acquired by another ISP, which then either merges the 
ASs or forms a "sibling" relationship in which their rout- 
ing domains appear independent but are controlled by 
one umbrella organization. Thus, in terms of the topol- 
ogy, bankruptcy means that a connection shifts from one 
ISP to another. Since high degree ISPs tend to be wealthier,
they are more likely to be involved in such takeovers.
We denote the rate of bankruptcy by $\mu$ per unit time.

A growing AS may decide to multihome, i.e., to con- 
nect to at least two Internet providers. One would expect 
that higher degree ISPs with a need for reliability would 
multihome to other higher degree ISPs. We model this 
phenomenon by assuming that multihoming links appear
in the system at rate $\nu$ per unit time. The probability
that a new link becomes attached to a pair of ISP nodes
of degrees $k_1$ and $k_2$ is proportional to $k_1 k_2$. The links
are directed from the customer to the provider, and we
assume that the higher degree ISP is the provider. We
also assume that non-ISPs multihome to an average of $m$
providers each. The model is illustrated in Figure 2.

We analyze this complete MPA model using techniques
similar to those of Section II. The relation we
obtain for $\bar{k}(s,t)$ is

$$ \bar{k}(s,t) = \left(\frac{t}{s}\right)^{\alpha}, \tag{7} $$

where

$$ \alpha = \frac{1 + 2\nu + m\rho + 2c + \mu}{2 + 2\nu + m\rho + 2c}. \tag{8} $$

Proceeding in a manner identical to Section II yields the
relation

$$ P(k) \sim k^{-\gamma}, \tag{9} $$

where

$$ \gamma = \frac{1}{\alpha} + 1 = 2 + \frac{1 - \mu}{1 + 2\nu + m\rho + 2c + \mu}. \tag{10} $$
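The dependence of the exponent on the model rates can be explored with a short sketch. It assumes the growth exponent $\alpha = (1 + 2\nu + m\rho + 2c + \mu)/(2 + 2\nu + m\rho + 2c)$ and the degree exponent $\gamma = 1 + 1/\alpha$; the function names and the argument order are hypothetical.

```python
def mpa_growth_exponent(rho, nu, m, c, mu):
    """alpha in the mean-degree growth law (t/s)^alpha of the MPA model."""
    total_ends = 2.0 + 2.0 * nu + m * rho + 2.0 * c      # connection ends added per unit time
    preferential = 1.0 + 2.0 * nu + m * rho + 2.0 * c + mu  # ends attached preferentially
    return preferential / total_ends


def mpa_degree_exponent(rho, nu, m, c, mu):
    """gamma = 1 + 1/alpha for the MPA degree distribution P(k) ~ k^(-gamma)."""
    return 1.0 + 1.0 / mpa_growth_exponent(rho, nu, m, c, mu)


if __name__ == "__main__":
    # Two-class limit: no multihoming, no peering, no bankruptcy, one provider per non-ISP.
    print(mpa_degree_exponent(rho=7.0 / 3.0, nu=0.0, m=1.0, c=0.0, mu=0.0))
    # Plain preferential attachment: additionally no non-ISPs.
    print(mpa_degree_exponent(rho=0.0, nu=0.0, m=1.0, c=0.0, mu=0.0))
```

With $\nu = c = \mu = 0$ and $m = 1$ the expression reduces to the two-class result $2 + 1/(1+\rho)$, which is a useful sanity check on the reconstruction of the formula.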



Validation Against Observed Topology 2 

We used the annotated Route Views data [31]
from [32] (the dataset is available at
http://www.caida.org/data/active/as_taxonomy/)
in order to obtain the empirical distribution of the number
of ISPs to which ASs multihome. We find that the
average number of providers that ISPs connect to is
2, meaning that $\nu = 1$. Indeed, since ISPs arrive at
rate 1 and choose one provider initially, multihoming
links entering at rate $\nu = 1$ yield an average of 2
providers per ISP. The average number of providers
that non-ISPs multihome to is 1.86, i.e., $m = 1.86$.
Dimitropoulos et al. [32] also showed that roughly 90%
of links are of customer-provider type, i.e., these links
pertain to transit relationships, with payments always
going to the provider ISP. They find that (a lower bound of)
10% of links are peering, i.e., these links correspond to
bilateral traffic exchange without payment. In the model,
customer links appear in the system at a rate of $1 + \nu + m\rho$.
We thus calculate $c = (1 + \nu + m\rho)/9 = 0.704$ peering
links per unit time. The authors of [32] also estimate
that the fraction of sibling links is too small to measure
accurately, and we take $\mu = 0$. Substituting these values
into the exponent expression (10) results in $\gamma = 2.114$,
which matches the observed values lying between 2.1 and
2.2 [2], [3].
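The arithmetic of this validation can be reproduced in a few lines. This is a sketch under the assumptions that customer links arrive at rate $1 + \nu + m\rho$, that peering links make up 10% of all links, and that $\gamma = 2 + (1-\mu)/(1 + 2\nu + m\rho + 2c + \mu)$; the helper names are hypothetical.

```python
def peering_rate(rho, nu, m, peer_fraction=0.1):
    """Peering-link rate c such that peering links are peer_fraction of all links."""
    customer_rate = 1.0 + nu + m * rho
    return customer_rate * peer_fraction / (1.0 - peer_fraction)


def degree_exponent(rho, nu, m, c, mu):
    """gamma = 2 + (1 - mu) / (1 + 2 nu + m rho + 2c + mu)."""
    return 2.0 + (1.0 - mu) / (1.0 + 2.0 * nu + m * rho + 2.0 * c + mu)


if __name__ == "__main__":
    rho, nu, m, mu = 7.0 / 3.0, 1.0, 1.86, 0.0   # measured values quoted in the text
    c = peering_rate(rho, nu, m)
    print("c =", round(c, 3))
    # Exponent with no non-ISPs, with non-ISPs only, and with all processes included.
    print(round(degree_exponent(0.0, 0.0, 1.0, 0.0, 0.0), 3))
    print(round(degree_exponent(rho, 0.0, 1.0, 0.0, 0.0), 3))
    print(round(degree_exponent(rho, nu, m, c, mu), 3))
```

The three exponents trace the progression from 3 (plain preferential attachment) to 2.3 (two classes) to about 2.11 (all processes), which makes the relative weight of each ingredient easy to see.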

The key point of this section is that the large ratio $\rho$ of
non-ISPs to ISPs is the dominant term in determining
the value of $\gamma$, bringing it down from 3 to 2.3, while all
the other parameters are less significant, decreasing $\gamma$
further from 2.3 to 2.1, its observed value. However, the
extensions considered in this section do strongly affect
other network properties, such as clustering, as we will
see in Section V.

Our last comment in this section concerns geography.
We could divide the world into different geographical regions,
each growing at a different rate. Due to the self-similar
nature of scale-free topologies [33], the resulting
graph would still have properties identical to those of the MPA
model as long as the parameters $\rho$, $c$, and $\mu$ are the same
in all regions. Evidence supporting this hypothesis is
available in [30], [34], where Chinese or European parts of
the Internet are shown to have properties virtually identical,
after proper rescaling, to the global AS topology.



IV. PEER, CUSTOMER, AND PROVIDER 
DISTRIBUTIONS 



In this section we calculate the distributions for the 
number of peers, customers, and providers that ISPs have 
in the MPA model. 

The first two distributions are power laws with
exponents equal to $\gamma$, the exponent of the overall degree
distribution. To see this, we first focus on the peer
distribution. We denote the average number of peers that
an ISP that arrived at time $s$ has at time $t$ by $\zeta(s,t)$. The
dynamics of peer link formation is given by

$$ \frac{d\zeta(s,t)}{dt} = \frac{2c}{(2 + 2\nu + m\rho + 2c)\,t}\,\bar{k}(s,t), \tag{11} $$

since $2c$ is the rate of arrival of loose peer connections,
which attach to target nodes with probability proportional
to the target node degree. If we define

$$ \beta = \frac{2c}{2 + 2\nu + m\rho + 2c}, $$

so that

$$ \frac{d\zeta(s,t)}{dt} = \frac{\beta}{t}\,\bar{k}(s,t), \tag{12} $$

then after the substitution of $\bar{k}(s,t) = (t/s)^{\alpha}$ from (7) with $\mu = 0$,
we have

$$ \frac{d\zeta(s,t)}{dt} = \frac{\beta}{t}\left(\frac{t}{s}\right)^{\alpha}. \tag{13} $$

We then solve the above for $\zeta(s,t)$; for $t \gg s$ the solution is

$$ \zeta(s,t) \approx \frac{\beta}{\alpha}\left(\frac{t}{s}\right)^{\alpha}. \tag{14} $$

Thus, the number $s$ of ISP nodes that have $\zeta$ or more
peers is

$$ s = \frac{t}{\left(\alpha\zeta/\beta\right)^{1/\alpha}}. \tag{15} $$

Dividing the right side by $t$ (the total number of ISP
nodes in the system at time $t$) gives the complementary cumulative
distribution of the average number of peers of ISPs.
Differentiating and multiplying by $-1$ yields the distribution

$$ f(\zeta) = \frac{1}{\alpha}\left(\frac{\beta}{\alpha}\right)^{1/\alpha}\zeta^{-\left(\frac{1}{\alpha}+1\right)}, \tag{16} $$

which corresponds to $P(\zeta) \sim \zeta^{-\gamma}$ with the same power
law exponent $\gamma = \frac{1}{\alpha} + 1$ as in (10). We can show in
an identical fashion that the distribution for the number
of customers that ISPs have follows the same power law
distribution. We will check the validity of these results
in Section V.
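The peer-count derivation can be illustrated with a short sketch. It assumes $\mu = 0$, the growth exponent $\alpha = (1 + 2\nu + m\rho + 2c)/(2 + 2\nu + m\rho + 2c)$, the coefficient $\beta = 2c/(2 + 2\nu + m\rho + 2c)$, and the large-$t$ approximation $\zeta(s,t) \approx (\beta/\alpha)(t/s)^{\alpha}$; the function names are hypothetical.

```python
import math


def growth_exponent(rho, nu, m, c):
    """alpha of the mean-degree growth law (t/s)^alpha, taking mu = 0."""
    total = 2.0 + 2.0 * nu + m * rho + 2.0 * c
    return (total - 1.0) / total


def peering_coefficient(rho, nu, m, c):
    """beta = 2c / (2 + 2 nu + m rho + 2c)."""
    return 2.0 * c / (2.0 + 2.0 * nu + m * rho + 2.0 * c)


def mean_peers(s, t, rho, nu, m, c):
    """Large-t approximation (beta/alpha) * (t/s)^alpha for an ISP that arrived at time s."""
    a = growth_exponent(rho, nu, m, c)
    b = peering_coefficient(rho, nu, m, c)
    return (b / a) * (t / s) ** a


def peer_ccdf(zeta, rho, nu, m, c):
    """Fraction of ISPs with at least zeta peers: (alpha * zeta / beta)^(-1/alpha)."""
    a = growth_exponent(rho, nu, m, c)
    b = peering_coefficient(rho, nu, m, c)
    return (a * zeta / b) ** (-1.0 / a)


if __name__ == "__main__":
    params = (7.0 / 3.0, 1.0, 1.86, 0.704)
    slope = math.log(peer_ccdf(100.0, *params) / peer_ccdf(10.0, *params)) / math.log(10.0)
    print("CCDF slope:", slope, "implied gamma:", 1.0 - slope)
```

The log-log slope of the CCDF equals $-1/\alpha$, so the implied density exponent $1 + 1/\alpha$ coincides with the exponent of the total degree distribution.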

We next show that the number of providers that ISPs have
is distributed as a random variable $1 + X$,
where $X$ approximately follows an exponential distribution
with parameter $\nu$ (i.e., with mean $\nu$). According to the model,
multihoming links between ISPs are directed from the lower to
the higher degree ISP. Although the probability that the
multihoming link connects to a particular ISP is proportional
to the ISP's degree, the probability that the chosen ISP
is the customer end of the multihoming link is inversely
proportional to its degree. We can thus expect the
probability that any particular ISP obtains an extra provider
at any time to be approximately the same. Since upon its
arrival each ISP always chooses a provider, the minimum
number of providers of ISPs is 1. If we now denote by
$1 + p(s,t)$ the average number of providers that ISPs that
appeared at time $s$ have at time $t$, then $p(s,t)$ is a solution
of the following equation

$$ \frac{dp(s,t)}{dt} = \frac{\nu}{t}, \tag{17} $$

because the rate of arrival of multihoming links is $\nu$.
Solving the above expression with the initial condition
$p(s,s) = 0$, we obtain $p(s,t) = \nu \ln(t/s)$, or equivalently

$$ s = t\,e^{-p/\nu}. \tag{18} $$

Since the number of ISPs at time $t$ is just $t$, this expression
implies that the distribution of the number of ISPs
that have $p$ or more multihoming links is exponential.
Since all ISPs (except the first) have at least one provider,
the distribution for the number of providers that ISPs
have should approximately be a random variable $1 + X$,
where $X$ is exponentially distributed with parameter $\nu$.
Given that the argument above is rather heuristic, we
can hardly expect this distribution in the real Internet
to be exactly exponential. However, the most important
consequence of this argument is that this distribution is
quite unlikely to be heavy-tailed.

FIG. 3: Provider distribution.

Validation Against Observed Topology 3

Using the same data as in Validation 2, the complementary
cumulative distribution function (CCDF) for the number of
providers that ISPs have (after subtracting the one initial
provider) versus the ISP degree is shown in Figure 3,
plotted in semi-log scale. The exponential curve fit to
the initial part of the graph has a slope of $-0.7$, i.e., the
average number of providers is $1 + 1/0.7 \approx 2.4$, which
is close to our empirically measured mean value of 2 in
Validation 2.

The purpose of this validation is to show not that the
distribution is exactly of the form $1 + X$ where $X \sim
\mathrm{Exp}(1/\nu)$, but that it is definitely not a power law.
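The relation between the fitted slope and the mean number of providers can be made explicit with a small sketch. It assumes the heuristic solution $p(s,t) = \nu\ln(t/s)$, so that the fraction of ISPs with $p$ or more extra providers is $e^{-p/\nu}$; the function names are hypothetical.

```python
import math


def mean_extra_providers(s, t, nu):
    """Average number of multihoming providers gained by time t by an ISP that arrived at s."""
    return nu * math.log(t / s)


def extra_provider_ccdf(p, nu):
    """Fraction of ISPs with p or more extra providers: exp(-p / nu)."""
    return math.exp(-p / nu)


def mean_providers_from_slope(slope):
    """Mean of 1 + X when ln(CCDF) of X is a straight line with the given slope.

    For an exponential X the slope is -1/mean(X), so the mean of 1 + X is 1 + 1/|slope|.
    """
    return 1.0 + 1.0 / abs(slope)


if __name__ == "__main__":
    print(mean_providers_from_slope(-1.0))  # slope implied by nu = 1
    print(mean_providers_from_slope(-0.7))  # slope of the fit quoted in the text
```

A slope of $-1$ corresponds to $\nu = 1$ and a mean of 2 providers, while the fitted slope of $-0.7$ corresponds to a mean of about 2.4.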



V. MODEL VALIDATION BY SIMULATION 

We have developed a model that describes the evolu- 
tion of the AS-level topology and validated the analytical 
results using measured parameters. We now simulate the 
MPA model using all of the measured parameters. 

The MPA model generates annotated graphs, with
links connecting either customers to providers (c2p links)
or peers to peers (p2p links). Therefore the total node
degree is a sum of the degrees of three types: the numbers
of customers, providers, and peers attached to a
node. Dimitropoulos et al. [24] have shown that the
2K-annotated distribution of the Internet essentially defines
its structure. In other words, if one randomizes the
Internet preserving its 2K-annotated distribution, then
the randomized topologies will be almost identical to the
original Internet topology. The 2K-annotated distribution
is a generalization of the joint degree distribution
for graphs with links annotated by their types. These
types can be abstracted by colors, and the traditional
scalar node degree becomes a vector of colors specifying
how many links colored by what colors are attached to
the node. In the Internet case, these colors are customer,
provider, and peer. The 2K-annotated distribution is
then the joint distribution for the vectors of colored
degrees of nodes connected by differently colored links.

Given the findings in [24], in order to show that
our model reproduces the Internet structure, it suffices
to compare the 2K-annotated distributions in simulated
networks and the Internet. Unfortunately, the
2K-annotated distribution is too multi-dimensional and
sparse. Therefore, we can work only with its projections.
The reasonable and informative projections include [24]:
(i) the degree distribution (DD): the traditional distribution
of total node degrees, i.e., the number of links of
all types attached to a node; (ii) the annotated distributions
(ADs): the distributions of the number of customers,
providers, and peers that nodes have, i.e., the
distribution of degrees of each type; (iii) the annotated
degree distribution (ADD): the joint distribution of customers,
providers, and peers of nodes, measuring the
per-node correlations among these three degree types;
and (iv) the joint degree distributions (JDDs): the JDDs
measure the correlations of total node degrees "across the
links" of different types.
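The first projections listed above can be computed directly from an annotated link list. The sketch below assumes the topology is given as a list of (customer, provider) pairs and a list of (peer, peer) pairs; this input format and all function names are illustrative assumptions rather than the format of any particular dataset.

```python
from collections import Counter, defaultdict

KINDS = ("customers", "providers", "peers")


def annotated_degrees(c2p_links, p2p_links):
    """Map each node to its (customers, providers, peers) degree vector."""
    vec = defaultdict(lambda: [0, 0, 0])
    for customer, provider in c2p_links:
        vec[customer][1] += 1   # the customer gains a provider
        vec[provider][0] += 1   # the provider gains a customer
    for a, b in p2p_links:
        vec[a][2] += 1
        vec[b][2] += 1
    return {node: tuple(v) for node, v in vec.items()}


def degree_distribution(ann):
    """DD: histogram of total node degrees."""
    return Counter(sum(v) for v in ann.values())


def annotated_distribution(ann, kind):
    """AD: histogram of the number of customers, providers, or peers per node."""
    index = KINDS.index(kind)
    return Counter(v[index] for v in ann.values())


def annotated_degree_distribution(ann):
    """ADD: histogram of the full (customers, providers, peers) vectors."""
    return Counter(ann.values())


if __name__ == "__main__":
    c2p = [("B", "A"), ("C", "A"), ("D", "B")]
    p2p = [("B", "C")]
    ann = annotated_degrees(c2p, p2p)
    print(ann)
    print(degree_distribution(ann))
```

Keeping the per-node vector as the primary object means the DD, ADs, and ADD are all simple histograms of the same data, which mirrors how they are defined as projections.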

Therefore, in our validation we compare all
these metrics between the graphs that the
MPA model produces and the Internet topology
annotated with AS relationships using [32]
on Jan 1, 2007. The specific dataset used is

http://as-rank.caida.org/data/2007/as-rel.20070101.a0.01

which is a part of [35] providing publicly available weekly
snapshots of the annotated Internet. These snapshots
are based on the Route Views BGP data [31]. The values
of the parameters we use in our simulations are $\rho = 2.3$,
$\nu = 1$, $c = 0.704$, and $m = 1.86$, which we recall are the
ratio of the numbers of non-ISPs to ISPs, the ratio of ISP
multihoming links to ISPs, the ratio of peering links to ISPs,
and the average number of providers to which non-ISPs
multihome, respectively. We run the simulation with
deterministic link arrivals based on their arrival rates.
The total numbers of ISP and non-ISP nodes are 7,200
and 16,800. These are the numbers of nodes that we
were able to classify in the dataset using [28]. We do
not model bankruptcy since, as mentioned earlier, the
rate at which it occurs is too small to get an accurate
estimate of our bankruptcy ratio $\mu$.

FIG. 4: Validation of the MPA model by simulation.

FIG. 5: Average neighbor degree and clustering in simulations
vs. real data.
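To make the simulation procedure concrete, the following is a simplified sketch of an MPA-style generator, not the simulator used for the results reported here. It makes several assumptions that should be read as illustrative: arrivals are scheduled deterministically with fractional backlogs, the non-integer $m$ is realized by giving each non-ISP either $\lfloor m \rfloor$ or $\lfloor m \rfloor + 1$ providers at random, pairs proportional to $k_1 k_2$ are drawn as two independent degree-proportional choices, and bankruptcy is omitted ($\mu = 0$). All names are hypothetical.

```python
import random
from collections import Counter


def simulate_mpa(n_isp, rho=2.3, nu=1.0, c=0.704, m=1.86, seed=0):
    """Grow a simplified MPA graph until it contains n_isp ISPs (n_isp >= 2)."""
    rng = random.Random(seed)
    is_isp = {0: True, 1: True}        # node id -> ISP flag
    degree = Counter({0: 1, 1: 1})     # total degree of every node
    ends = [0, 1]                      # ISP ids, one entry per attached link end
    c2p = [(1, 0)]                     # (customer, provider) links
    p2p = []                           # peering links
    linked = {frozenset((0, 1))}       # node pairs that already share a link
    backlog = {"non_isp": 0.0, "multihome": 0.0, "peer": 0.0}

    def connect(a, b):
        linked.add(frozenset((a, b)))
        for node in (a, b):
            degree[node] += 1
            if is_isp[node]:
                ends.append(node)      # only ISPs can be chosen as future targets

    def pick_isp_pair():
        # Two independent degree-proportional draws give probability ~ k1 * k2.
        for _attempt in range(100):
            a, b = rng.choice(ends), rng.choice(ends)
            if a != b and frozenset((a, b)) not in linked:
                return a, b
        return None                    # give up if no free pair is found

    for _step in range(n_isp - 2):
        # 1. A new ISP arrives and attaches to a degree-proportional provider.
        new_isp = len(is_isp)
        is_isp[new_isp] = True
        provider = rng.choice(ends)
        connect(new_isp, provider)
        c2p.append((new_isp, provider))
        backlog["non_isp"] += rho
        backlog["multihome"] += nu
        backlog["peer"] += c

        # 2. Non-ISPs arrive at rate rho and attach to m providers on average.
        while backlog["non_isp"] >= 1.0:
            backlog["non_isp"] -= 1.0
            node = len(is_isp)
            is_isp[node] = False
            wanted = int(m) + (1 if rng.random() < m - int(m) else 0)
            providers = set()
            for _attempt in range(100):
                if len(providers) >= max(wanted, 1):
                    break
                providers.add(rng.choice(ends))
            for provider in sorted(providers):
                connect(node, provider)
                c2p.append((node, provider))

        # 3. Multihoming links between ISPs; the higher-degree end is the provider.
        while backlog["multihome"] >= 1.0:
            backlog["multihome"] -= 1.0
            pair = pick_isp_pair()
            if pair:
                customer, provider = sorted(pair, key=lambda n: degree[n])
                connect(customer, provider)
                c2p.append((customer, provider))

        # 4. Peering links between ISPs.
        while backlog["peer"] >= 1.0:
            backlog["peer"] -= 1.0
            pair = pick_isp_pair()
            if pair:
                connect(*pair)
                p2p.append(pair)

    return degree, is_isp, c2p, p2p
```

A call such as `simulate_mpa(7200)` would produce a graph of roughly the size used in this section, although the sketch is not optimized for speed. Storing one list entry per ISP link end keeps degree-proportional selection to a single uniform draw, at the cost of extra memory.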



Degree Distribution (DD): Figure 4(a) shows the
DD of the graphs generated by the MPA model, and its
comparison with the observed topology. As predicted in
Section III, the MPA model produces a power law
DD, and the exponent of the CCDF matches the
BGP data well.



Annotated Distributions (AD): Figures 4(b)-4(d)
show the ADs generated by the MPA model. We compare
the customer, peer, and provider degree distributions of
the simulated graph with those of the BGP tables. As
predicted, the ADs of the numbers of customers and peers
are both power laws with the same exponent as
the DD.

We plot the CCDF of the number of providers that
ISPs multihome to (on a linear x-axis and a logarithmic y-axis)
in Figure 4(d). They are approximately of the form
$1 + X$, where $X$ is exponentially distributed. The curves
show a discrepancy in slope. We believe that it arises 
due to the fact that almost all the distribution mass is 
concentrated at small degrees, as the mean is 2, and the 
number of ISPs with high multihoming degree is small. 

Annotated Degree Distribution (ADD): Each 
ISP has some numbers of providers, peers, and customers.
The ADD is the joint distribution of these numbers
across all observed ISPs. We illustrate these correlations
in Figures 4(e) and 4(f). To construct these plots,
we first bin the ISP nodes by the number of providers 
that they have (the x-axis), and then compute the aver- 
age number of customers or peers that the ISPs in each 
bin have (the y-axis). We observe that the MPA model 
approximately matches the BGP data against these met- 
rics as well. 

Joint Degree Distributions (JDDs): While the
ADD contains information about the correlations between
the numbers of different types of nodes connected
to an ISP, it does not reveal information about the degree
correlations between different ISPs
connected to each other, i.e., whether higher degree ISPs
are more likely to peer with each other, etc. This information
is contained in the average neighbor connectivity,
which is a summary statistic of the joint degree distributions,
shown in Figures 4(g) and 4(h). Specifically, let the probability
that a node of degree $k$ has a c2p link to a node
of degree $k'$ be called $P_{c2p}(k'|k)$. Then the average degree
of the provider ISPs of ISPs that have degree $k$ is
$\bar{k}_{c2p}(k) = \sum_{k'} k' P_{c2p}(k'|k)$. In a full mesh graph with $n$
nodes and undirected links, since all nodes have degree
$n - 1$, the value of this coefficient is simply $n - 1$. We
show the normalized value $\bar{k}_{c2p}(k)/(n - 1)$ in Figure 4(g).
The similarly normalized values of $\bar{k}_{p2p}(k)$ are shown in
Figure 4(h). These functions exhibit similar behaviors
for the MPA model and BGP data.
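The average provider degree can be estimated from an annotated link list as follows. This is a sketch assuming the same hypothetical input format of (customer, provider) and (peer, peer) pairs, with $P_{c2p}(k'|k)$ estimated as the empirical frequency over all c2p links whose customer end has total degree $k$; the function names are illustrative.

```python
from collections import defaultdict


def total_degrees(c2p_links, p2p_links):
    """Total degree of each node, counting links of all types."""
    degree = defaultdict(int)
    for a, b in list(c2p_links) + list(p2p_links):
        degree[a] += 1
        degree[b] += 1
    return degree


def average_provider_degree(c2p_links, p2p_links, normalize=False):
    """Average total degree of the providers of nodes with total degree k.

    Returns {k: average}. With normalize=True the values are divided by n - 1,
    where n is the number of nodes that have at least one link.
    """
    degree = total_degrees(c2p_links, p2p_links)
    sums, counts = defaultdict(float), defaultdict(int)
    for customer, provider in c2p_links:
        k = degree[customer]
        sums[k] += degree[provider]
        counts[k] += 1
    scale = (len(degree) - 1) if normalize else 1
    return {k: sums[k] / counts[k] / scale for k in sums}


if __name__ == "__main__":
    c2p = [("B", "A"), ("C", "A"), ("D", "B")]
    p2p = [("B", "C")]
    print(average_provider_degree(c2p, p2p))
    print(average_provider_degree(c2p, p2p, normalize=True))
```

The analogous statistic for peering links would iterate over the p2p list in both directions, since peering links are undirected.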

Coupled with observations in [24] that the real Internet
topology is accurately captured by its 2K-annotated distribution,
the results in this section provide evidence that
the MPA model reproduces closely the Internet AS-level
topology across a wide range of metrics. As an example,
we show in Figure 5 two standard topology metrics: the
average neighbor degree and clustering as functions of
the total node degree. We observe that even though
two-class preferential attachment in Section II produces tree
networks, its multiclass extensions in Section III, implemented
in our simulations, closely reproduce the clustering
observed in the real Internet, which is a consequence of
the 2K-annotated randomness of the Internet and the
2K-annotated distribution match in Figure 4. Finally,
the average total node degrees in the real Internet and
simulations are 4.1 and 4.2, respectively.



VI. CONCLUSION 

We constructed a realistic and analytically tractable 
model of the Internet AS topology evolution that we 
call the multiclass preferential attachment (MPA) model. 
The MPA model is based on preferential attachment, and 
we believe it uses the minimum number of measurable 
parameters altering standard preferential attachment to 
produce annotated topologies that are remarkably sim- 
ilar to the real AS Internet topology. Each model pa- 
rameter reflects a realistic aspect of AS dynamics. We 
measure all parameters using publicly available AS topol- 
ogy data, substitute them in our derived analytic expres- 
sions for the model, and find that it produces topologies 
that match observed ones against a definitive set of net- 
work topology characteristics. These characteristics are 
projections of the second-order degree correlations, an- 
notated with AS business relationships. Matching them 
ensures that synthetic AS topologies match the real one 
according to all other important metrics [23], [24].

The model parameter that has the most noticeable ef- 
fect on the properties of generated topologies reflects the 
ratio of ISP to non-ISP ASs. Contrary to common beliefs, 
the parameters taking care of AS peering, bankruptcies,
multihoming, etc., are less important as far as the degree
distribution is concerned, although they do affect other
network properties, such as clustering. No other param- 
eters or complicated mechanisms appear to be needed to 
explain the Internet topology annotated with AS business 
relationships. In other words, preferential attachment, 
with the MPA modifications, appears to explain the com- 
plexity of the AS-level Internet abstracted as an anno- 
tated graph. An interesting open question concerns the 
origins of preferential attachment in the Internet. Given 
that the vast majority of AS links connect customer ASs
to their providers [32], this question reduces to finding
how customers select their providers. The popularity of
providers, their "brand names," may be an important
factor explaining the preferential attachment mechanism
acting in the Internet.